Similar resources
[Figure residue: architecture diagram of thread processing units, each with its own first-level instruction cache and execution unit, backed by a shared second-level instruction cache]
This paper presents a new parallelization model, called coarse-grained thread pipelining, for exploiting speculative coarse-grained parallelism from general-purpose application programs in shared-memory multiprocessor systems. This parallelization model, which is based on the fine-grained thread pipelining model proposed for the superthreaded architecture [11, 12], allows concurrent execution of l...
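A minimal sketch of the pipelining idea sketched in this abstract, under assumptions of mine: successive loop iterations are assigned round-robin to thread units, and each iteration forwards its loop-carried value to the next unit before doing the bulk of its work, so downstream iterations can start early. The unit count, the `compute_carried_value` and `long_independent_work` functions, and the queue-based forwarding are hypothetical; the speculation and rollback machinery of the superthreaded model is omitted.

```python
import threading
from queue import Queue

NUM_THREAD_UNITS = 4   # hypothetical number of thread processing units
NUM_ITERATIONS = 16

def compute_carried_value(prev):
    """Hypothetical loop-carried computation (the part that must be forwarded)."""
    return prev + 1

def long_independent_work(i, carried):
    """Hypothetical iteration body that does not feed the next iteration."""
    return carried * carried + i

results = [None] * NUM_ITERATIONS

def thread_unit(unit_id, channels):
    # Each unit executes iterations unit_id, unit_id + N, unit_id + 2N, ...
    for i in range(unit_id, NUM_ITERATIONS, NUM_THREAD_UNITS):
        carried = channels[unit_id].get()         # wait for the forwarded loop-carried value
        next_carried = compute_carried_value(carried)
        # Forward early: the successor iteration may start before this one finishes.
        channels[(unit_id + 1) % NUM_THREAD_UNITS].put(next_carried)
        # The bulk of the iteration overlaps with its successors.
        results[i] = long_independent_work(i, carried)

channels = [Queue() for _ in range(NUM_THREAD_UNITS)]
channels[0].put(0)   # seed the loop-carried value for iteration 0
units = [threading.Thread(target=thread_unit, args=(u, channels))
         for u in range(NUM_THREAD_UNITS)]
for t in units:
    t.start()
for t in units:
    t.join()
print(results)
```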
Improving Inter-thread Data Sharing with GPU Caches
The massive amount of fine-grained parallelism exposed by a GPU program makes it difficult to exploit shared cache benefits even when there is good program locality. The non-deterministic nature of thread execution in the bulk synchronous parallel (BSP) model makes the situation even worse. Most prior work on exploiting GPU cache sharing focuses on regular applications that have linear memory acces...
Parallel Data Sharing in Cache: Theory, Measurement and Analysis
Cache sharing on a multicore processor is usually competitive. In multi-threaded code, however, different threads may access the same data and have a cooperative effect in cache. This report describes a new metric called shared footprint and a new locality theory to measure and analyze parallel data sharing in cache. Shared footprint is machine independent, i.e. data sharing in all cache sizes...
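A rough, heavily simplified illustration of measuring parallel data sharing from per-thread address traces, assuming Python and a hypothetical trace format: it only counts how many threads touch each cache block over a whole trace, which is not the window-based, machine-independent shared-footprint metric the paper defines.

```python
from collections import defaultdict

def sharing_profile(traces, block_size=64):
    """Given per-thread address traces, count how many distinct cache blocks
    are touched by exactly k threads, for each k. Rough illustration only;
    the paper's shared-footprint metric is defined over execution windows."""
    sharers = defaultdict(set)            # cache block -> set of thread ids
    for tid, trace in enumerate(traces):
        for addr in trace:
            sharers[addr // block_size].add(tid)
    histogram = defaultdict(int)          # number of sharers -> number of blocks
    for tids in sharers.values():
        histogram[len(tids)] += 1
    return dict(histogram)

# Hypothetical traces: threads 0 and 1 share the block around address 4096.
traces = [
    [4096, 4100, 8192, 8200],   # thread 0
    [4096, 4104, 12288],        # thread 1
    [16384, 16392],             # thread 2
]
print(sharing_profile(traces))  # {2: 1, 1: 3}
```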
Cache-Fair Thread Scheduling for Multicore Processors
We present a new operating system scheduling algorithm for multicore processors. Our algorithm reduces the effects of unequal CPU cache sharing that occur on these processors and cause unfair CPU sharing, priority inversion, and inadequate CPU accounting. We describe the implementation of our algorithm in the Solaris operating system and demonstrate that it produces fairer schedules enabling be...
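A toy illustration of the general idea behind compensating a thread whose effective cache share is unfairly small, assuming a made-up adjustment rule: the fair-IPC estimate, the numbers, and the proportional formula are mine, not the Solaris implementation described in the paper.

```python
def adjusted_timeslice(base_slice_ms, fair_ipc, measured_ipc):
    """Toy version of the cache-fair idea: if co-runners squeeze a thread's
    cache share so its measured IPC falls below the IPC it would achieve under
    fair cache sharing, give it proportionally more CPU time (and vice versa).
    The real algorithm estimates fair IPC online and bounds the adjustment;
    those details are omitted here."""
    return base_slice_ms * (fair_ipc / measured_ipc)

# Hypothetical example: a thread achieving 0.8 IPC instead of an estimated
# fair 1.0 IPC receives a 25% longer timeslice to keep its progress rate fair.
print(adjusted_timeslice(10.0, fair_ipc=1.0, measured_ipc=0.8))  # 12.5
```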
Sharing and Contention in Coherent-Cache Parallel
Parallel graph reduction is a model for parallel program execution in which shared memory is used under a strict access regime with single assignment and blocking reads. We present the design of an efficient and accurate multiprocessor simulation scheme and the results of a simulation study of the pattern of access of a suite of benchmark programs.
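A minimal sketch of the access regime this abstract mentions (single assignment with blocking reads), in the style of an I-var-like synchronisation cell; the class and method names are hypothetical and this is not code from the paper's simulator.

```python
import threading

class SingleAssignmentCell:
    """A write-once cell: reads block until the single assignment happens,
    and a second write is an error."""
    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def write(self, value):
        if self._ready.is_set():
            raise RuntimeError("single-assignment cell written twice")
        self._value = value
        self._ready.set()

    def read(self):
        self._ready.wait()   # block until the producer has written
        return self._value

# Hypothetical usage: one thread produces a node's value, another waits on it.
cell = SingleAssignmentCell()
reader = threading.Thread(target=lambda: print("read:", cell.read()))
reader.start()
cell.write(42)
reader.join()
```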
Journal
Journal title: ACM SIGPLAN Notices
Year: 2017
ISSN: 0362-1340, 1558-1160
DOI: 10.1145/3155284.3018759